Bacillus thuringiensis is one of the six species of the "Bacillus cereus group" in the genus Bacillus within the family Bacillaceae. Strain Sbt003 was isolated from soil and identified as B. thuringiensis. It harbors at least seven plasmids and produces parasporal crystals of three shapes: oval, bipyramidal and rice-shaped. SDS-PAGE analysis of the spore-crystal suspension of this strain reveals six major protein bands, which suggests the presence of multiple parasporal crystal genes. Bioassays show that this strain has specific activity against nematodes and human cancer cells. In this study, we report the whole-genome shotgun sequence of Sbt003. The high-quality draft genome is 6,175,670 bp long (including the chromosome and plasmids) and contains 6,372 protein-coding and 80 RNA genes.



Introduction 

Bacillus thuringiensis, B. cereus, B. anthracis and three other species constitute the "Bacillus cereus group", a nontaxonomic term, within the genus Bacillus and family Bacillaceae [1]. These species were classified as separate species mainly on the basis of their distinct phenotypes, although extensive genomic studies of strains from these species using different techniques have suggested that they form a single species [2-5]. Strain Sbt003 belongs to the species B. thuringiensis. The type strain of the species produces one or more parasporal crystal proteins with specific activity against the larvae of certain insects from various orders [6]. The specific activity of these insecticidal crystal proteins, and the large number of genes encoding them, have attracted much attention from both academic and industrial researchers. Dozens of B. thuringiensis strains have been sequenced, and dozens more are under way. In this study, we present a summary classification and a set of features for B. thuringiensis Sbt003, together with a description of its genome sequencing and annotation.

Classification and features 

B. thuringiensis strain Sbt003 harbors at least seven plasmids and produces parasporal crystals of three different shapes: oval, bipyramidal and rice-shaped (Figures 1A and 1B, Table 1). SDS-PAGE analysis of the spore-crystal suspension of this strain reveals six major protein bands of 168.8, 148.5, 133.5, 117.2, 107.9 and 103.1 kDa, which suggests the presence of multiple parasporal crystal genes (Figure 1C).

A representative genomic 16S rDNA sequence of strain Sbt003 was searched against the GenBank database using BLAST [21]. Sequences showing more than 97% identity to the 16S rDNA of Sbt003 were selected for phylogenetic analysis, and the 16S rDNA sequence of B. subtilis subsp. subtilis str. 168 was used as the outgroup. The nine sequences were aligned with the ClustalW algorithm, and the tree was constructed by the neighbor-joining method with the Kimura 2-parameter substitution model. The tree was assessed with 1,000 bootstrap replicates, and the consensus tree is shown in Figure 2.
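The 97% identity cut-off and the Kimura 2-parameter (K2P) model can be illustrated with a short Python sketch. This is not the ClustalW or MEGA 4 code used in the study: the function names are hypothetical, the two sequences are assumed to be already aligned to equal length, and gaps and ambiguous bases are simply skipped. The K2P distance is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions; neighbor joining would then cluster the resulting pairwise distance matrix (not shown).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def identity(a: str, b: str) -> float:
    """Fraction of identical bases over aligned positions where neither has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x == y for x, y in pairs) / len(pairs)

def k2p_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    transitions = transversions = n = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases
        n += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if n == 0:
        raise ValueError("no comparable positions")
    p = transitions / n
    q = transversions / n
    # math.log raises ValueError when the pair is too divergent for the model
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)

# Toy aligned fragments (hypothetical): one transition and one transversion
a = "ACGTTGCAAGGT"
b = "ACGTTACAAGCT"
print(f"identity={identity(a, b):.3f}  K2P={k2p_distance(a, b):.4f}")
```

In this scheme, a BLAST hit would be kept for the tree when its identity exceeds 0.97, and the K2P distances of all retained pairs would form the input to neighbor joining.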

Genome sequencing and annotation 

Genome project history 

This organism was selected for sequencing because of its specific activity against nematodes and human cancer cells. The high-quality draft genome sequence has been deposited in GenBank. The Beijing Genomics Institute (BGI) performed the sequencing, and NCBI staff used the Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP) to complete the annotation. A summary of the project is given in Table 2.
Figure 1. General characteristics of Bacillus thuringiensis Sbt003. (A) Agarose gel electrophoresis of total DNA of Sbt003. Lane M, molecular mass standard, Lambda DNA/HindIII; Lane 1, Sbt003. (B) Phase-contrast micrograph of a sporulated Sbt003 culture. (C) SDS-PAGE analysis of crystal proteins of Sbt003. Lane M, molecular mass standard; Lane 1, Sbt003.
Table 1. Classification and general features of B. thuringiensis Sbt003 according to the MIGS recommendations [7]

MIGS ID    Property                  Term                              Evidence code (a)
           Current classification    Domain Bacteria                   TAS [8]
                                     Phylum Firmicutes                 TAS [9-11]
                                     Class Bacilli                     TAS [12,13]
                                     Order Bacillales                  TAS [14,15]
                                     Family Bacillaceae                TAS [14,16]
                                     Genus Bacillus                    TAS [14,17,18]
                                     Species Bacillus thuringiensis    TAS [14,19]
                                     Type strain HD73
           Gram stain                Gram-positive                     NAS
           Cell shape                Rod-shaped                        IDA
           Motility                  Motile                            NAS
           Sporulation               Spore-forming                     IDA
           Temperature range         Room temperature                  NAS
           Optimum temperature       28°C                              IDA
           Carbon source             Organic carbon source             NAS
           Energy source             Organic carbon source             NAS
MIGS-6     Habitat                   Soil                              IDA
MIGS-6.3   Salinity                  Salt tolerant                     NAS
MIGS-22    Oxygen                    Aerobic                           NAS
MIGS-14    Pathogenicity             Avirulent                         NAS
MIGS-4     Geographic location       Hubei, China                      IDA
MIGS-4.1   Latitude                  29-31 N
MIGS-4.2   Longitude                 111-114 E
MIGS-4.3   Depth                     5-10 cm
MIGS-4.4   Altitude                  About 35 m
MIGS-5     Sample collection time    2000                              IDA

a) Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [20].
Figure 2. Neighbor-joining phylogenetic tree generated using MEGA 4 based on 16S rRNA sequences (scale bar: 0.001). The strains and their corresponding GenBank accession numbers (and, when applicable, draft sequence coordinates) for 16S rDNA sequences are: A, B. thuringiensis serovar konkukian str. 97-27 (AE017355.1): 9337-10763; B, B. thuringiensis BMB171 (CP001903): 9217-10643; C, B. subtilis subsp. subtilis str. 168 (NC_000964): 9839-11263; D, B. cereus ATCC 10987 (NC_003909): 9335-10761; E, B. anthracis str. 'Ames Ancestor' (NC_007530): 9335-10761; F, B. anthracis str. Sterne (NC_005945): 9336-10762; G, B. thuringiensis str. Al Hakam (NC_008600): 9336-10762; H, B. cereus ATCC 14579 (NC_004722): 28956-30382.



Growth conditions and DNA isolation 

B. thuringiensis Sbt003 was grown in 50 mL Luria broth for 6 hours at 28°C. DNA was isolated by incubating the cells with lysozyme (10 mg/mL) in 2 mL TE buffer (50 mM Tris base, 10 mM EDTA, 20% sucrose, pH 8.0) at 4°C for 6 hours. Then 4 mL of 2% SDS was added and the mixture was incubated at 55°C for 30 min; 2 mL of 5 M NaCl was added, and the mixture was incubated at 4°C for 10 min. DNA was purified by organic extraction and ethanol precipitation.
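The working concentrations of SDS and NaCl in the lysis steps follow from the dilution relation C1V1 = C2V2. The sketch below is illustrative only: it assumes that volumes are additive and that the cell pellet adds negligible volume to the 2 mL of buffer, and the function and variable names are hypothetical.

```python
def final_concentration(stock_conc: float, stock_ml: float, total_ml: float) -> float:
    """Dilution: C1 * V1 = C2 * V2, solved for C2 (same units as stock_conc)."""
    return stock_conc * stock_ml / total_ml

LYSIS_ML = 2.0              # cells resuspended in 2 mL sucrose/Tris/EDTA buffer
SDS_ML, SDS_PCT = 4.0, 2.0  # 4 mL of 2% SDS
NACL_ML, NACL_M = 2.0, 5.0  # 2 mL of 5 M NaCl

# After adding SDS (55 °C step): 2 mL + 4 mL, assuming additive volumes
sds_during_lysis = final_concentration(SDS_PCT, SDS_ML, LYSIS_ML + SDS_ML)

# After adding NaCl (4 °C step): 2 mL + 4 mL + 2 mL
total_ml = LYSIS_ML + SDS_ML + NACL_ML
sds_final = final_concentration(SDS_PCT, SDS_ML, total_ml)
nacl_final = final_concentration(NACL_M, NACL_ML, total_ml)

print(f"SDS during lysis: {sds_during_lysis:.2f}%")                 # 1.33%
print(f"After NaCl: SDS {sds_final:.2f}%, NaCl {nacl_final:.2f} M")  # 1.00%, 1.25 M
```

Under these assumptions, lysis proceeds at about 1.33% SDS, and the 4°C step runs at 1% SDS with 1.25 M NaCl.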

Genome sequencing and assembly 

The genome of B. thuringiensis Sbt003 was sequenced on the Illumina HiSeq 2000 platform, combining 100-bp paired-end reads from a 500-bp genomic library with 90-bp mate-pair reads from a 2-kb genomic library. Reads with an average quality score below Q30 or with more than 3 unidentified nucleotides were removed. Using SOAPdenovo version 1.05, 22,295,588 paired-end reads (2.01 Gb, ~325-fold coverage) and 11,166,312 mate-pair reads (1.00 Gb, ~163-fold coverage) were assembled de novo [22]. The assembly is considered a high-quality draft and consists of 104 contigs arranged in 61 scaffolds, with a total size of 6,175,670 bp.

Bioinformatic analysis identified two large plasmids, belonging to the ori44-type and repB-type plasmids, respectively. The former has two ori44-type replicons; we propose that it arose from the fusion of two plasmids, and its estimated size is about 200 kb. The latter has an expected size of at least 90 kb based on the sequence of contig0027, which is typical of repB-type plasmids (80-90 kb). In addition, we identified five other plasmids from the plasmid profile (see Figure 1A). The expected sizes of the three smaller ones are 13 kb, 8 kb and 4 kb, respectively, while the sizes of the two larger ones could not be deduced either from the plasmid profile or by bioinformatic analysis.
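The read filter and the coverage figures can be sketched as follows. This is not the BGI filtering pipeline: the function names are hypothetical, and Phred+33 quality encoding is an assumption (older Illumina pipelines used Phred+64, hence the offset parameter). Fold coverage is the number of sequenced bases divided by the assembly size; because the Gb totals in the text are rounded, this gives about 325-fold and 162-fold, close to the reported ~325- and ~163-fold (488× combined in Table 2).

```python
GENOME_BP = 6_175_670  # draft assembly size reported above

def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed by default)."""
    if not qual:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in qual) / len(qual)

def passes_filter(seq: str, qual: str, min_mean_q: float = 30.0,
                  max_n: int = 3, offset: int = 33) -> bool:
    """Keep a read unless its mean quality is below Q30 or it has more than 3 Ns."""
    return mean_phred(qual, offset) >= min_mean_q and seq.upper().count("N") <= max_n

def fold_coverage(total_bases: float, genome_bp: int = GENOME_BP) -> float:
    """Average sequencing depth = sequenced bases / genome size."""
    return total_bases / genome_bp

paired_end = fold_coverage(2.01e9)  # about 325x
mate_pair = fold_coverage(1.00e9)   # about 162x (1.00 Gb is a rounded figure)
print(f"{paired_end:.0f}x + {mate_pair:.0f}x = {paired_end + mate_pair:.0f}x")
```

Averaging quality over the whole read, as described in the text, can keep reads with a low-quality tail; per-base trimming is a common alternative but is not what the source describes.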

Genome annotation 

Genome annotation was completed using the Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP). Briefly, protein-coding genes were predicted using a combination of GeneMark and Glimmer [23-25]. Ribosomal RNAs were predicted by BLAST sequence similarity searches against an RNA sequence database and/or by using Infernal and Rfam models [26,27]. Transfer RNAs were predicted using tRNAscan-SE [28]. To detect missing genes, a complete six-frame translation of the nucleotide sequence was performed, and the proteins predicted above were masked. All predictions were then searched with BLAST against all proteins from complete microbial genomes. Annotation was based on comparison to protein clusters and on the BLAST results. Conserved Domain Database (CDD) and Clusters of Orthologous Groups (COG) information was then added to the annotation.
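The six-frame translation step can be illustrated with a minimal Python sketch. It is not PGAAP code: the helper names are hypothetical, it uses the standard codon table (bacterial translation table 11 encodes the same amino acids and differs only in its alternative start codons), and it assumes that "masking" means replacing the nucleotide intervals of already-predicted genes with N, so that those codons translate to X.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]

def translate(seq: str) -> str:
    """Translate complete codons; codons containing N or other symbols become X."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))

def six_frame_translation(seq: str) -> dict:
    """Return translations keyed '+1'..'+3' (forward) and '-1'..'-3' (reverse)."""
    frames = {}
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames

def mask_regions(seq: str, intervals) -> str:
    """Replace 0-based, half-open intervals (e.g. predicted genes) with N."""
    chars = list(seq.upper())
    for start, end in intervals:
        end = min(end, len(chars))
        chars[start:end] = ["N"] * max(0, end - start)
    return "".join(chars)

# Example: mask a hypothetical predicted gene at positions 0-3, then translate
masked = mask_regions("ATGGCCTAA", [(0, 3)])
for frame, protein in six_frame_translation(masked).items():
    print(frame, protein)
```

The unmasked stretches of each frame would then be the candidates searched against known proteins for missed genes.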
Table 2. Genome sequencing project information

MIGS ID     Property                   Term
            Finishing quality          High-quality draft
MIGS-28     Libraries used             Two genomic libraries: one Illumina paired-end library (500-bp insert size) and one Illumina mate-pair library (2-kb insert size)
MIGS-29     Sequencing platform        Illumina HiSeq 2000
MIGS-31.2   Sequencing coverage        488×
MIGS-30     Assemblers                 SOAPdenovo version 1.05
MIGS-32     Gene calling method        Glimmer and GeneMark
            GenBank date of release    Pending
            NCBI project ID            175950
            Project relevance          Biotechnological



Table 3. Genome statistics

Attribute                                    Value        % of total
Genome size (bp)                             6,175,670    100.00
DNA coding region (bp)                       4,818,828    78.03
DNA G+C content (bp)                         2,174,469    35.21
Number of scaffolds                          61
Extrachromosomal elements                    > 300 kb     > 4.86
Total genes                                  6,452        100.00
tRNA genes                                   70           1.08
rRNA genes                                   10           0.16
rRNA operons                                 0**
Protein-coding genes                         6,372        98.76
Pseudogenes (partial genes)                  0 (49)       0 (0.76)
Genes with function prediction (proteins)    4,248        66.67
Genes assigned to COGs                       4,334        68.02
Genes with signal peptides                   437          6.86
CRISPR repeats                               0            0

** None of the rRNA operons appears to be complete because of unresolved assembly problems.
Genome properties

The high-quality draft assembly of the genome consists of 104 contigs in 61 scaffolds, with an overall G+C content of 35.21%. Of the 6,452 predicted genes, 6,372 are protein-coding genes and 80 are RNA genes (70 tRNA and 10 rRNA genes). The majority of the protein-coding genes (66.67%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins (Table 3). The distribution of genes into COG functional categories is presented in Table 4.
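The percentages in Tables 3 and 4 use different denominators, which can be checked with a few lines of Python. In Table 3, the gene-type percentages are relative to the 6,452 total genes, while function prediction, COG assignment, signal peptides and "Not in COGs" are relative to the 6,372 protein-coding genes. The COG category percentages in Table 4 match the 5,086 total category assignments (a gene can fall into more than one category) rather than the 4,334 COG-assigned genes. This denominator is inferred from the arithmetic, not stated in the source, and the variable names are illustrative.

```python
TOTAL_GENES = 6_452
PROTEIN_CODING = 6_372

def pct(part: int, whole: int, digits: int = 2) -> float:
    """Percentage of part in whole, rounded as in the tables."""
    return round(100 * part / whole, digits)

# Table 3: gene-level percentages with their apparent denominators
table3 = {
    "Protein-coding genes": pct(6_372, TOTAL_GENES),
    "tRNA genes": pct(70, TOTAL_GENES),
    "Genes with function prediction": pct(4_248, PROTEIN_CODING),
    "Genes assigned to COGs": pct(4_334, PROTEIN_CODING),
    "Genes with signal peptides": pct(437, PROTEIN_CODING),
    "Not in COGs": pct(2_038, PROTEIN_CODING),
}

# Table 4: gene counts per COG category code
cog_counts = {
    "J": 224, "A": 0, "K": 485, "L": 374, "B": 1, "D": 48, "Y": 0, "V": 143,
    "T": 225, "M": 254, "N": 59, "Z": 1, "W": 1, "U": 65, "O": 122, "C": 215,
    "G": 310, "E": 480, "F": 109, "H": 156, "I": 140, "P": 309, "Q": 124,
    "R": 783, "S": 458,
}
cog_total = sum(cog_counts.values())  # total category assignments, not genes
table4 = {code: pct(n, cog_total, digits=3) for code, n in cog_counts.items()}

for name, value in table3.items():
    print(f"{name}: {value}%")
print(f"COG assignments: {cog_total}; J = {table4['J']}%, R = {table4['R']}%")
```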



The whole-genome sequence and the predicted coding sequences of Sbt003 were analyzed with BtToxin_scanner [29], and eight potential crystal protein genes were identified. Of these, four were considered full-length (locus tags C797_02099, C797_12066, C797_12568 and C797_27783), while the others were considered truncated (locus tags C797_02094, C797_12046, C797_12061 and C797_18417).



Table 4. Number of genes associated with the general COG functional categories

Code   Value   %        Description
J      224     4.404    Translation, ribosomal structure and biogenesis
A      0       0.0      RNA processing and modification
K      485     9.536    Transcription
L      374     7.354    Replication, recombination and repair
B      1       0.020    Chromatin structure and dynamics
D      48      0.944    Cell cycle control, cell division, chromosome partitioning
Y      0       0        Nuclear structure
V      143     2.812    Defense mechanisms
T      225     4.424    Signal transduction mechanisms
M      254     4.994    Cell wall/membrane/envelope biogenesis
N      59      1.160    Cell motility
Z      1       0.020    Cytoskeleton
W      1       0.020    Extracellular structures
U      65      1.278    Intracellular trafficking, secretion, and vesicular transport
O      122     2.399    Posttranslational modification, protein turnover, chaperones
C      215     4.227    Energy production and conversion
G      310     6.095    Carbohydrate transport and metabolism
E      480     9.438    Amino acid transport and metabolism
F      109     2.143    Nucleotide transport and metabolism
H      156     3.067    Coenzyme transport and metabolism
I      140     2.753    Lipid transport and metabolism
P      309     6.076    Inorganic ion transport and metabolism
Q      124     2.438    Secondary metabolites biosynthesis, transport and catabolism
R      783     15.395   General function prediction only
S      458     9.005    Function unknown
-      2038    31.98    Not in COGs